ABSTRACT 


The ability to distribute cryptographic keys securely has been a challenge 
for centuries. The Diffie-Hellman key exchange protocol was the first practical 
solution to the key exchange dilemma. The Diffie-Hellman protocol allows two 
parties to exchange a secret key over unsecured communication channels 
without meeting in advance. The secret key can then be used in a symmetric 
encryption application, and the two parties can communicate securely. However, 
if the key exchange takes place in certain mathematical environments, the 
exchange becomes vulnerable to a specific man-in-the-middle attack, first 
observed by Vanstone [1]. We explore this man-in-the-middle attack, analyze 
countermeasures against the attack, and extend the attack to the multi-party 
setting. 

I. INTRODUCTION 


The ability to communicate securely has been a challenge for millennia. 
For as long as people have tried to exchange private information, others have 
tried to compromise their privacy. In the modern communications environment, 
radio frequency communications and worldwide digital networks, such as the 
Internet, compound the problem. Both are susceptible to eavesdropping, often 
trivially. By simply placing an antenna in the region of a radio frequency 
broadcast or tapping a wire anywhere between two nodes on a digital network, 
an uninvited third party can easily gain access to seemingly private 
correspondence. The field of cryptography, the practice and study of hiding 
information, has made enormous progress combating the eavesdropping threat. 


Cryptography and encryption/decryption methods fall into two broad 
categories: symmetric and public key. In symmetric cryptography, sometimes 
called classical cryptography, parties share the same encryption/decryption key. 
Therefore, before using a symmetric cryptography system, the users must 
somehow come to an agreement on a key to use. An obvious problem arises 
when the parties are separated by large distances, which is commonplace in 
today’s worldwide digital communications. If the parties did not meet prior to 
their separation, how do they agree on the common key to use in their 
cryptosystem without a secure channel? They could send a trusted courier to 
exchange keys, but that is not feasible if time is a critical factor in their 
communication. 


The problem of securely distributing keys used in symmetric ciphers has 
challenged cryptographers for hundreds of years. If an unauthorized user gains 
access to the key, the cryptographic communication must be considered broken. 
Amazingly, in 1976, Whitfield Diffie and Martin Hellman published a paper in 
which they presented a key exchange protocol that provided the first practical 
solution to this dilemma. The protocol, named the Diffie-Hellman key exchange 
(or key agreement) protocol in their honor, allows two parties to derive a common 
secret key by communicating over an unsecured channel, while sharing no 
secret keying material a priori [2]. While Diffie and Hellman have received 
recognition for creating the protocol, it later emerged that the Government 
Communications Headquarters (GCHQ), a British intelligence agency, had 
independently invented a similar protocol a few years before Diffie and Hellman 
published their breakthrough paper. However, the British government classified 
their findings and the results were not released to the public until 1997 [3]. 


The Diffie-Hellman protocol relies on the difficulty of solving discrete 
logarithms in finite fields and the related intractability of the Diffie-Hellman 
problem. Due to the difficulty of solving these mathematical problems, an 
eavesdropper is unable to compute efficiently the secret key with any or all of the 
information intercepted in the open communication channel. Once the secret key 
has been exchanged successfully between the two parties, they may proceed by 
using the key in their symmetric crypto system. 


Before conducting the key exchange using the Diffie-Hellman protocol, the 
parties must agree on a prime number that defines the mathematical 
environment in which the key exchange will take place. If the prime number is 
large enough, a brute force attack to find the secret key becomes infeasible. 
However, if the two parties agree on certain prime numbers, an active adversary 
can compromise their communication. A man-in-the-middle attack becomes 
possible if the prime number that defines the environment can be written in the 
form $p = Rq + 1$, where R is a “small” integer and q is a “large” prime. In that 
case, the attacker can modify the messages between the two parties so that they 
both derive a key that belongs to a subgroup of size R. If R is small enough, the 
attacker can search the key space in a reasonable amount of time, determine the 
key the parties agreed to, and eavesdrop on their communication. 
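
The effect described above can be illustrated on a toy prime. The sketch below is only an illustration under stated assumptions: it assumes the attacker modifies each intercepted message by raising it to the q-th power (one standard way to force a value into the subgroup of order R), and the prime 67 = 6 * 11 + 1, the generator 2, the secrets, and the helper name `confine` are all hypothetical choices made for this example.

```python
def confine(value: int, q: int, p: int) -> int:
    """Assumed attacker step: raise an intercepted value to the q-th power mod p."""
    return pow(value, q, p)

p, R, q = 67, 6, 11        # toy prime of the form p = R*q + 1 (illustrative only)
alpha = 2                  # a generator of the multiplicative group modulo 67
x, y = 29, 50              # Alice's and Bob's secrets (arbitrary)

to_bob = confine(pow(alpha, x, p), q, p)     # replaces Alice's message
to_alice = confine(pow(alpha, y, p), q, p)   # replaces Bob's message

key_alice = pow(to_alice, x, p)
key_bob = pow(to_bob, y, p)

# Every key the parties can derive is a power of alpha^q, whose order divides R,
# so the attacker only has to try at most R candidates.
candidates = {pow(alpha, q * k, p) for k in range(R)}
```

Both parties still agree on a key, but it is one of at most R values, which is what makes the search feasible when R is small.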


This thesis investigates the Diffie-Hellman protocol and the difficulty of the 
discrete logarithm problem on which the protocol relies. We then analyze the 
man-in-the-middle attack described above by developing an algorithm to conduct 
the attack, estimating the complexity involved in executing it, and approximating 
the number of prime numbers that are vulnerable. We then consider several 
proposed methods to defend against the attack and demonstrate their 
effectiveness. Finally, we extend the attack to several multi-party variants of the 
protocol and demonstrate their potential vulnerability. 




II. BACKGROUND AND REVIEW 


Before beginning a discussion of the Diffie-Hellman protocol and the 
man-in-the-middle attack, we present some basic definitions and 
theorems. This information is available in any standard algebra text, such as 
Fraleigh’s Abstract Algebra [4], or discrete mathematics text, such as Rosen’s 
Discrete Mathematics and Its Applications [5]. It is assumed the reader is 
familiar with common mathematical, logical, and set notation. 

We conclude the chapter with a brief discussion of computational 
complexity and primality testing, which will be useful in our analysis of the 
man-in-the-middle attack. 


A. NUMBER THEORY 


If a and b are integers and $a \neq 0$, we say that a divides b if there is an 
integer c such that $b = ac$. When a divides b we say that a is a factor of b and 
that b is a multiple of a. The notation $a \mid b$ denotes that a divides b. Given two 
integers a and b, both non-zero, the largest integer d such that $d \mid a$ and $d \mid b$ is 
called the greatest common divisor of a and b. The greatest common divisor of 
a and b is denoted by $\gcd(a, b)$. The integers a and b are relatively prime if 
their greatest common divisor is one. 


Every positive integer greater than one is divisible by at least two integers, 
itself and one. If these are its only positive factors, we call this integer prime. A positive 
integer that is greater than one, and not prime, is called composite. The primes 
are the building blocks of positive integers. The Fundamental Theorem of 
Arithmetic states that every positive integer greater than one can be written 
uniquely as a prime or as a product of two or more primes, where the prime factors 
are written in order of nondecreasing size. Given a positive integer n, let the prime 
factorization of n be denoted by 

$$n = \prod_{i=1}^{k} p_i^{e_i}.$$

In some situations, we care only about the remainder of an integer when it 
is divided by some specified positive integer, denoted by m. If a and b are 
integers, then a is congruent to b modulo m if m divides $a - b$. We use the 
notation $a \equiv b \pmod{m}$ to indicate that a is congruent to b modulo m. Note that 
$a \equiv b \pmod{m}$ if and only if $a \bmod m = b \bmod m$. Also, if n divides a, then a is 
congruent to zero modulo n. 

The great French mathematician Pierre de Fermat (1601-1665) 
demonstrated that the congruence 

$$a^{p-1} \equiv 1 \pmod{p}$$

holds when p is a prime not dividing a, and this gives us a theorem that will prove 
crucial in our analysis of the man-in-the-middle attack. 

Fermat’s Theorem [4]: If $a \in \mathbb{Z}$ and p is a prime not dividing a, then p 
divides $a^{p-1} - 1$, that is, $a^{p-1} \equiv 1 \pmod{p}$. 


Euler gave a generalization of Fermat’s theorem, but we must first define 
Euler’s Totient Function. Commonly referred to as Euler’s Phi Function, the 
function gives the number of positive integers less than or equal to n which are relatively 
prime to n, and is denoted by $\phi(n)$. It is not hard to show that, if $n = \prod_{i=1}^{k} p_i^{e_i}$ with distinct primes $p_i$, then 

$$\phi(n) = n \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right).$$

Euler’s Theorem [4]: If $a \in \mathbb{Z}$ is relatively prime to n, then $a^{\phi(n)} - 1$ is 
divisible by n, that is, $a^{\phi(n)} \equiv 1 \pmod{n}$. 
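
As a quick numerical check of the two theorems above, the following sketch counts the totient directly and tests Euler's congruence for every base coprime to n. The function names `phi` and `euler_holds` are our own, and the brute-force count is only suitable for small n.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by direct count (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def euler_holds(n: int) -> bool:
    """Check a^phi(n) = 1 (mod n) for every a in [1, n) coprime to n."""
    t = phi(n)
    return all(pow(a, t, n) == 1 for a in range(1, n) if gcd(a, n) == 1)
```

For a prime modulus such as 37, phi(37) = 36 and the check reduces to Fermat's Theorem.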


In several cases, this thesis will involve systems of linear congruences. 
The Chinese Remainder Theorem (CRT), named after the Chinese heritage of 
problems involving systems of linear congruences, states that when the moduli of 
a system of linear congruences are pairwise relatively prime, there is a unique 
solution of the system modulo the product of the moduli. 

Chinese Remainder Theorem [5]: Let $m_1, m_2, \ldots, m_n$ be pairwise relatively prime 
positive integers and $a_1, a_2, \ldots, a_n$ arbitrary integers. Then the system 

$$x \equiv a_1 \pmod{m_1},$$
$$x \equiv a_2 \pmod{m_2},$$
$$\vdots$$
$$x \equiv a_n \pmod{m_n}$$

has a unique solution modulo $m = m_1 m_2 \cdots m_n$. (That is, there is a solution x with 
$0 \leq x < m$, and all other solutions are congruent modulo m to this solution.) 
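
The theorem is constructive. A minimal sketch of one standard construction follows; the function name `crt` is our own, the moduli are assumed to be pairwise relatively prime, and the three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli; return x in [0, m)."""
    m = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = m // m_i                        # product of the other moduli
        x += a_i * M_i * pow(M_i, -1, m_i)    # M_i * inverse is 1 mod m_i, 0 mod the others
    return x % m
```

For example, the system x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7) has the unique solution 23 modulo 105.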


B. GROUP THEORY 
A group $(G, *)$ is a set G, closed under a binary operation $*$, such that 
the following axioms are satisfied: 
Associativity: For all $a, b, c \in G$, $(a * b) * c = a * (b * c)$. 
Identity: There is an element e in G such that for all $x \in G$, 
$e * x = x * e = x$. 
Inverse: Corresponding to each $a \in G$, there is an element $a'$ in G such 
that $a * a' = a' * a = e$. 
A group that also satisfies the commutative property is referred to as an abelian 
(or commutative) group. 

Commutativity: For all $a, b \in G$, $a * b = b * a$. 

A group G is said to be a finite group if the set G has a finite number of 
elements. In this case, the number of elements is called the order of G, 
denoted by $|G|$. This thesis is interested only in finite groups. 

If a subset H of a group G is closed under the binary operation of G and 
if H with the induced operation from G is itself a group, then H is a subgroup of 
G. We shall let $H \leq G$ or $G \geq H$ mean that H is a subgroup of G, and $H < G$ 
or $G > H$ shall mean $H \leq G$ but $H \neq G$. 

An example of a group is the set of congruence classes of the integers 
modulo n. Given a positive integer n, we denote a congruence class by $[a]$, 
which is the set of all integers congruent to a modulo n. The set of congruence 
classes modulo n is denoted by 

$$\mathbb{Z}_n = \{[0], [1], \ldots, [n-2], [n-1]\}.$$

This set forms a group under addition, where $[a] + [b] = [a + b]$, and is denoted 
by $(\mathbb{Z}_n, +)$. We can easily inspect a group using a group table. Table 1 is a 
group table for $\mathbb{Z}_5$ under addition. The elements of $\mathbb{Z}_5$ are the column and row 
headings, with the binary operation (addition in this case) in the upper left 
corner. 




















+    0    1    2    3    4 
0    0    1    2    3    4 
1    1    2    3    4    0 
2    2    3    4    0    1 
3    3    4    0    1    2 
4    4    0    1    2    3 

Table 1.   Group Table for $(\mathbb{Z}_5, +)$ 


If n is a prime p, then the set $\mathbb{Z}_p^* = \mathbb{Z}_p - \{[0]\}$ forms a group under 
multiplication modulo p. It is necessary to remove the zero class 
because zero has no inverse under multiplication. $(\mathbb{Z}_p^*, \cdot)$, the multiplicative 
group of nonzero congruence classes modulo a prime p, is the structure we will 
be focusing on in this thesis. The Diffie-Hellman key exchange protocol sets this 
group as the environment for the key agreement. If we remove the zero element 
from the previous example, we have another group table (Table 2), this time with 
multiplication as the binary operation. 














·    1    2    3    4 
1    1    2    3    4 
2    2    4    1    3 
3    3    1    4    2 
4    4    3    2    1 

Table 2.   Group Table for $(\mathbb{Z}_5^*, \cdot)$ 





Let G be a group and let $a \in G$. Then the subgroup $\{a^n \mid n \in \mathbb{Z}\}$ of G is 
called the cyclic subgroup of G generated by a, and is denoted by $\langle a \rangle$. Further, 
a generates G if $\langle a \rangle = G$. A group G is cyclic if there is some element a in G 
that generates G. 

The group $(\mathbb{Z}_p^*, \cdot)$ is always cyclic. An important property of cyclic groups 
is that every subgroup of a cyclic group is also cyclic. Another important property 
of groups in general is the Theorem of Lagrange. 

Lagrange’s Theorem [4]: Let H be a subgroup of a finite group G. 
Then the order of H is a divisor of the order of G. 

This powerful theorem makes the attack we will analyze later possible. 
We know the order of $(\mathbb{Z}_p^*, \cdot)$ is $p - 1$. The two properties mentioned above tell 
us that any subgroup of $(\mathbb{Z}_p^*, \cdot)$ will also be cyclic and the order of the subgroup 
will be a divisor of $p - 1$. 
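
These two properties are easy to observe on a small prime. The sketch below enumerates the cyclic subgroup generated by each element modulo 37 and records its order; the function name `cyclic_subgroup` and the choice p = 37 are ours, and the loop assumes p is prime and a is not divisible by p.

```python
def cyclic_subgroup(a: int, p: int) -> set:
    """Return the cyclic subgroup of Z_p^* generated by a (p prime, a not divisible by p)."""
    elems, x = set(), 1
    while True:
        x = (x * a) % p
        elems.add(x)
        if x == 1:          # the powers of a have cycled back to the identity
            return elems

p = 37
orders = {a: len(cyclic_subgroup(a, p)) for a in range(1, p)}
generators = [a for a, n in orders.items() if n == p - 1]
```

Every order that appears divides p - 1 = 36, as Lagrange's Theorem requires, and the elements of order 36 are exactly the generators of the whole group.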


C. FIELD THEORY 


A field $(F, +, \cdot)$ is a set F together with two binary operations, which we 
will call addition and multiplication, defined on F such that the following axioms 
are satisfied: 

Addition: $(F, +)$ is an abelian group. 

Multiplication: $(F^*, \cdot)$ is an abelian group, where $F^*$ denotes the nonzero elements of F. 

Distributive: For all $a, b, c \in F$, $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$. 

A field F is said to be a finite field if the set F has a finite number of elements. 
If F is a finite field, then the multiplicative group is cyclic. 

For every prime p and positive integer n, there is exactly one finite field 
(up to isomorphism) of order $p^n$. This field $GF(p^n)$ is usually referred to as the 
Galois field of order $p^n$. Oftentimes, the Diffie-Hellman key exchange protocol 
is described using the environment $GF(p)$ instead of the group $\mathbb{Z}_p^*$. In the group 
theory section, we described the notion of a generator of a cyclic group. In field 
theory, specifically in $GF(p)$, an element that generates the entire 
multiplicative group is known as a primitive root. The number of primitive roots of 
a field $GF(p)$ is $\phi(\phi(p)) = \phi(p - 1)$. 


D. COMPUTATIONAL COMPLEXITY 


Before the discussion of primality testing, it is important to understand 
what makes one test more efficient than another. Computational complexity 
involves the study of the efficiency of algorithms based on the time and memory 
space required to solve a problem of a particular size [5]. Usually, complexities 
are expressed using the Big-O notation. 

Definition [5]: Let f and g be functions from the set of integers or the 
set of real numbers to the set of real numbers. We say that $f(x)$ is $O(g(x))$ if 
there are constants C and k such that 

$$|f(x)| \leq C\,|g(x)|$$

whenever $x > k$. [This is read as “$f(x)$ is big-oh of $g(x)$.”] 


This notation is extremely helpful when comparing algorithms, such as the 
primality tests we will discuss. We will use the Big-O notation as an upper bound 
on the number of operations a test will require. In general, the smaller the upper 
bound, the more efficient the test is. The more efficient the test is, the quicker it 
can complete the required steps of an algorithm and give an answer. Thus, 
using the Big-O notation, we can often quickly decide which test will finish 
soonest, using fewer resources and less computer time. 

The most commonly used functions in Big-O notation are: 

$$1,\ \log n,\ n,\ n \log n,\ n^2,\ 2^n,\ n!$$

It can be shown that each function in the list is smaller than the succeeding 
function as n grows without bound [5]. Figure 1 demonstrates this fact. 

[Figure: plot of the functions listed above for n from 2 to 8, on a logarithmic vertical axis running from 1 to 2048] 
Figure 1.   Growth of Functions Used in Big-O Estimates [From 5] 


Notice the vertical axis scale is logarithmic, doubling each unit. This 
causes the exponential function $2^n$ to appear as a straight line. 

An algorithm that is Big-O of a constant has constant complexity. An 
algorithm that is Big-O of a logarithm has logarithmic complexity, and so on. 
Table 3 displays the common terminology used to describe the time complexity 
of algorithms. 

Complexity           Terminology 

$O(1)$               Constant complexity 
$O(\log n)$          Logarithmic complexity 
$O(n)$               Linear complexity 
$O(n^b)$             Polynomial complexity 
$O(b^n)$, $b > 1$    Exponential complexity 
$O(n!)$              Factorial complexity 

Table 3.   Computational Complexity Terminology [From 5] 


The algorithms we will be concerned with are of polynomial and 
exponential complexity. The difference between the two can be enormous. 
Polynomial or better complexities are called tractable, because it is assumed 
that given a reasonably-sized input, the algorithm will produce an answer in a 
reasonable amount of time. On the other hand, exponential complexities or 
worse are called intractable. This is because an extremely large amount of time 
is usually required to run the algorithm. However, a polynomial complexity 
algorithm with a very high degree might take longer to run than an exponential 
complexity algorithm with a small base. 


E. PRIMALITY TESTING 


We now turn to a topic of critical importance in our analysis of the 
man-in-the-middle attack. Suppose a large integer is given. How might we quickly 
tell if the number is prime or composite? Mathematicians have studied 
this question for millennia, and recently it has become even more 
important as modern computing power has made it possible to test theories on 
a scale that was once inconceivable. A primality test is an algorithm for 
determining whether an input number is prime. Primality tests can be divided 
into two main groups: deterministic and probabilistic. Deterministic primality 
tests prove with certainty whether a number is prime or composite. Probabilistic 
primality tests tell us a number is composite or probably prime. If a probabilistic 
method reports that the number is composite, the number is definitely composite. 
However, if it reports the number as prime, there is a controllably small chance 
the number is actually composite [6]. 


Primality testing is currently a topic of great interest and research and is, 
therefore, very dynamic. We provide descriptions of several deterministic and 
probabilistic algorithms as background for the reader. This is by no means a 
comprehensive discussion of every algorithm available. Rather, we use this 
section to motivate our choice of a primality test for later, when we 
will need to quickly determine if a given number is prime. 


1. Deterministic Primality Tests 
a. Trial Division 


The simplest primality test is trial division. Trial division is the 
method of sequentially trying test divisors into a number n so as to partially or 
completely factor n [6]. We start with the first prime number, 2, and try to divide 
n by 2. If 2 divides n, we know n is composite and can stop. If 2 does not 
divide n, we try the next prime number, 3. If 3 divides n, we stop. If not, we try 
the next prime, and so on. When we reach a trial divisor that is greater than the 
square root of n, we may stop. If no prime up to the square root of n divides n, 


then we declare n a prime. 


This test is quite computationally intensive. Let $\pi(t)$ be the prime 
counting function, which counts the number of primes $\leq t$. Trial division 
requires (in the worst case) about $\pi(\sqrt{n}) \approx \frac{2\sqrt{n}}{\ln n}$ divisions, if the primes up to $\sqrt{n}$ are 
stored in a database, or even $\frac{\sqrt{n}}{2}$ divisions, if the primes are not stored before 
the test starts. 
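
A minimal sketch of trial division in the second setting (no stored list of primes) is given below. Instead of dividing only by primes, it divides by 2 and then by every odd number up to the square root of n, which matches the count of roughly half the square root of n divisions; the function name `is_prime_trial` is our own.

```python
from math import isqrt

def is_prime_trial(n: int) -> bool:
    """Trial division by 2 and then by odd candidates up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False        # found a nontrivial divisor
    return True
```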

b. The n-1 Test 

Trial division can be used to test small numbers for primality, but for 
larger numbers there are better methods [6]. The $n-1$ test is based on Fermat's 
little theorem, and suggests that we try to factor $n-1$, not n. In 1876, E. Lucas 
turned Fermat's little theorem into a primality test. 

Lucas’ Theorem [6]: If a, n are integers with $n > 1$, and $a^{n-1} \equiv 1 \pmod{n}$, 
but $a^{(n-1)/q} \not\equiv 1 \pmod{n}$ for every prime $q \mid n-1$, then n is 
prime. 

The most difficult step in implementing the Lucas test is finding the 
complete factorization of $n-1$. Pocklington strengthened the result by realizing a 
partial factorization would suffice [6]. In particular, say 

$$n - 1 = FR, \text{ and the complete factorization of } F \text{ is known.} \tag{1}$$

Pocklington’s Theorem: Suppose (1) holds and a is such that 

$$a^{n-1} \equiv 1 \pmod{n} \text{ and } \gcd(a^{(n-1)/q} - 1,\ n) = 1 \text{ for each prime } q \mid F. \tag{2}$$

Then every prime factor of n is congruent to 1 (mod F). 

Corollary (n-1 test): If (1) and (2) hold and $F > \sqrt{n}$, then n is prime. 
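
Lucas' Theorem translates directly into a check for a single base. In the sketch below, the function name `lucas_certifies` is our own and the caller is assumed to supply every distinct prime factor of n - 1, which, as noted above, is the hard part in practice.

```python
def lucas_certifies(n: int, a: int, prime_factors) -> bool:
    """True if base a proves n prime via Lucas' Theorem.

    prime_factors must contain every distinct prime dividing n - 1
    (supplied by the caller; this sketch does not factor n - 1).
    """
    if pow(a, n - 1, n) != 1:
        return False
    return all(pow(a, (n - 1) // q, n) != 1 for q in prime_factors)
```

A return value of False does not mean n is composite; it only means this particular base does not certify primality. For n = 37 the base 2 works, while the base 10 does not because 10 has order 3 modulo 37.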


Several results have allowed a smaller value of F. These include 
work done by Brillhart, Lehmer, Selfridge, Konyagin, and Pomerance [6]. 
The Lucas test and variations of it have a running time of about 


O((ogn)’). The question of finding the “right” base still remains. 


c. Elliptic Curve Primality Proving 


Elliptic Curve Primality Proving (ECPP) is a class of algorithms that 
provide certificates of primality using sophisticated results from the theory of 
elliptic curves. A detailed description of the background, theory, and 
implementation of the ECPP can be found in Atkin and Morain [7]. 


ECPP is the fastest known general-purpose primality-testing 
algorithm. ECPP has a running time of $O((\log n)^4)$ [7]. 


d. The AKS Test 


In August 2002, the Agrawal-Kayal-Saxena (AKS) primality test 
was published in a paper titled “PRIMES is in P” [8]. The result was highly 
celebrated because of the four properties the test satisfies: 
1) It can be used to verify the primality of any given number. 
2) The maximum running time is polynomial. 
3) The algorithm is deterministic, not probabilistic. 
4) The algorithm is not conditional on an unproven hypothesis. 

There are other algorithms that satisfy three of the four properties, 
but AKS is the only known test to satisfy all four. 
The test is based upon the equivalence 

$$(x - a)^n \equiv (x^n - a) \pmod{n}$$

for a coprime to n, which is true if and only if n is prime. This is a generalization 
of Fermat’s Little Theorem and constitutes a primality test by itself. However, the 
verification of primality would take exponential time, and thus requires 
improvement. The AKS test makes use of a related equivalence 

$$(x - a)^n \equiv (x^n - a) \pmod{n,\ x^r - 1}.$$

This equivalence can be checked in polynomial time, with the complexity of the 
original algorithm being $O((\log n)^{12})$. However, recently the complexity has been 
brought down to $O((\log n)^{6})$ [9]. 

2. Probabilistic Primality Tests 
a. Fermat Primality Test 

Based on Fermat’s Little Theorem, the Fermat Primality Test is a 
probabilistic primality test that is the basis for the Miller-Rabin primality test used 
later on in the thesis. 

Recall that by Fermat’s Little Theorem, if p is prime and p does 
not divide a, then $a^{p-1} \equiv 1 \pmod{p}$. If we want to test if a given integer n is 
prime, we compute $a^{n-1} \bmod n$ for several values of a. If the result is not 1 for 
some value of a, then n is composite. If the result is 1 for many values of a, 
then we can say that n is probably prime. 

The reason we can only say probably is because the congruence 
$a^{n-1} \equiv 1 \pmod{n}$ may hold when n is composite. A composite number n is a 
(Fermat) pseudoprime to base a if the congruence $a^{n-1} \equiv 1 \pmod{n}$ holds [6]. 
Unfortunately, for the Fermat Primality Test, there are infinitely many composite numbers 
that the test would call probably prime even if every base a coprime to n were tried [6]. 
These numbers are the so-called Carmichael numbers and give us reason to 
look for a test that will only give pseudoprimes for a fixed fraction of the bases 
attempted. The Miller-Rabin test accomplishes this goal. 
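
The weakness of the Fermat test can be seen on the smallest Carmichael number, 561 = 3 * 11 * 17. The sketch below is only an illustration; the function name `fermat_passes` is our own.

```python
from math import gcd

def fermat_passes(n: int, a: int) -> bool:
    """True if a^(n-1) = 1 (mod n), i.e. n is a probable prime to base a."""
    return pow(a, n - 1, n) == 1

# 561 = 3 * 11 * 17 is composite, yet it passes for every base coprime to it.
coprime_bases = [a for a in range(2, 561) if gcd(a, 561) == 1]
fooled_by_all = all(fermat_passes(561, a) for a in coprime_bases)
```

Only a base sharing a factor with 561, such as 3, exposes it as composite, and finding such a base is as hard as finding a factor.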


b. Miller-Rabin Primality Test 


The Miller-Rabin Primality Test is an efficient probabilistic algorithm 
to test for primality based on the idea of strong pseudoprimes. Consider an odd 
composite number n and write $n - 1 = d \cdot 2^s$ with d odd. Then n is a strong pseudoprime to base a if 
either $a^d \equiv 1 \pmod{n}$ or $a^{d \cdot 2^r} \equiv -1 \pmod{n}$ for some $r = 0, 1, \ldots, s-1$. The Carmichael 
numbers are Fermat pseudoprimes for every base coprime to them. However, a composite 
number can only be a strong pseudoprime to at most one quarter of all bases [6]. 

The algorithm is as follows: 

Choose a random integer $a \in [2, n-2]$. If $a^d \not\equiv 1 \pmod{n}$ and 
$a^{2^r d} \not\equiv -1 \pmod{n}$ for all $r = 0, 1, \ldots, s-1$, then a is called a witness and n is 
composite. Otherwise, n is a strong probable prime to base a. 

If $n > 9$ is an odd composite, the probability that the algorithm will 
fail to produce a witness for n is $< 1/4$. The probability that we fail to find a 
witness after k iterations is $< 1/4^k$ [6]. We can make this probability as small as 
we desire with a large number of iterations. For instance, if we wanted to ensure 
the probability of calling a composite number a prime is less than $10^{-6}$, we must 
compute 10 iterations or more. 


As an example, suppose we wanted to determine if the number 341 
is prime. First we write $341 - 1 = 340 = 2^2 \cdot 85$. So $s = 2$ and $d = 85$. We randomly 
select $a = 38$ and proceed with: 

$$a^{d} \bmod n = 38^{85} \bmod 341 = 56 \neq 1$$
$$a^{2^0 d} \bmod n = 38^{85} \bmod 341 = 56 \neq n - 1$$
$$a^{2^1 d} \bmod n = 38^{170} \bmod 341 = 67 \neq n - 1.$$

Since none of the congruences hold, we know 341 is composite. In 
fact, $341 = 11 \cdot 31$. However, consider $n = 703$ and $a = 3$. $703 - 1 = 702 = 2^1 \cdot 351$. 
So $s = 1$ and $d = 351$. Continuing: 

$$a^{d} \bmod n = 3^{351} \bmod 703 = 702 \neq 1$$
$$a^{2^0 d} \bmod n = 3^{351} \bmod 703 = 702 = n - 1$$

By the second congruence, 703 is a strong pseudoprime to base 3. If 
we then try $a = 5$, we get: 

$$a^{d} \bmod n = 5^{351} \bmod 703 = 438 \neq 1$$
$$a^{2^0 d} \bmod n = 5^{351} \bmod 703 = 438 \neq n - 1.$$

This time neither congruence holds, and we know 703 is a 
composite number. In fact, $703 = 19 \cdot 37$. 
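
The worked examples above can be reproduced with a short sketch of a single Miller-Rabin round. The function name `is_strong_probable_prime` is our own, the base is passed in rather than chosen at random so that the examples are repeatable, and n is assumed to be odd and greater than 2.

```python
def is_strong_probable_prime(n: int, a: int) -> bool:
    """One Miller-Rabin round for odd n > 2 and base a."""
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):     # check a^(2^r d) for r = 1, ..., s-1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False               # a is a witness, so n is composite
```

With this function, base 38 is a witness for 341, base 3 fails to expose 703, and base 5 is a witness for 703, in agreement with the computations above.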
The Miller-Rabin test is very fast and has a complexity of 
$O((\log n)^3)$. Of course, because it is probabilistic, there is a chance of the test 
returning a number as prime when it is in fact composite. However, as will be 
demonstrated later, we are very concerned with the speed of the primality test 
and no deterministic test will run fast enough for our purpose. The Miller-Rabin 
test offers us both speed, as compared to other primality tests, and the ability to 
control the probability of error and will be our tool of choice. 

III. DIFFIE-HELLMAN AND THE DISCRETE LOGARITHM 


A. THE DIFFIE-HELLMAN PROTOCOL 


“We stand today on the brink of a revolution in cryptography.” This was 
the first sentence in a breakthrough paper published in 1976 by Whitfield Diffie 
and Martin E. Hellman. In the paper, titled New Directions in Cryptography [10], 
the authors introduced the idea of public key cryptography and a key exchange 
protocol that was named in their honor. The Diffie-Hellman protocol provided the 
first practical solution to the key distribution problem, allowing two parties, never 
having met in advance or shared keying material, to establish a shared secret by 
exchanging messages over an open channel. The key can then be used to 
encrypt subsequent communications using a symmetric key cipher. The security 
rests on the intractability of the Diffie-Hellman problem and the related problem of 
computing discrete logarithms [1]. We will call the two parties conducting the key 
exchange “Alice” and “Bob.” 


Protocol steps: 

1. A prime number p and a generator $\alpha$ of $\mathbb{Z}_p^*$ ($2 \leq \alpha \leq p-2$) are 
   selected and published. 

2. Alice chooses a random secret x, $1 \leq x \leq p-2$, and sends Bob 
   $\alpha^x \bmod p$. 
   A → B: $\alpha^x \bmod p$ 

3. Bob chooses a random secret y, $1 \leq y \leq p-2$, and sends Alice 
   $\alpha^y \bmod p$. 
   B → A: $\alpha^y \bmod p$ 

4. Bob receives $\alpha^x$ and computes the shared key as $K = (\alpha^x)^y \bmod p$. 

5. Alice receives $\alpha^y$ and computes the shared key as $K = (\alpha^y)^x \bmod p$. 

Because $(\alpha^x)^y = (\alpha^y)^x$, Alice and Bob have arrived at the same secret 
key. Only x, y, and $\alpha^{xy}$ are kept secret. All other values are sent in the 
clear. The example below illustrates the procedure. 

1. Alice and Bob agree on $p = 37$ and $\alpha = 2$. 

2. Alice chooses $x = 14$ and sends Bob $30\ (= 2^{14} \bmod 37)$. 
   A → B: 30 

3. Bob chooses $y = 23$ and sends Alice $5\ (= 2^{23} \bmod 37)$. 
   B → A: 5 

4. Bob receives 30 and computes $30^{23} \bmod 37 = 28$. 

5. Alice receives 5 and computes $5^{14} \bmod 37 = 28$. 

Alice and Bob have agreed upon 28 as their secret key. 
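
The same exchange can be replayed in a few lines using modular exponentiation. The variable names below are our own, and the secrets are fixed to the values of the example rather than chosen at random.

```python
p, alpha = 37, 2           # public parameters from the example
x, y = 14, 23              # Alice's and Bob's secrets

A = pow(alpha, x, p)       # Alice -> Bob
B = pow(alpha, y, p)       # Bob -> Alice

key_bob = pow(A, y, p)     # Bob computes (alpha^x)^y mod p
key_alice = pow(B, x, p)   # Alice computes (alpha^y)^x mod p
```

Only A and B travel over the open channel; x, y, and the resulting key never do.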


Figure 2 demonstrates which parties know what information. The 
man-in-the-middle will be called Eve from here on out. 





Figure 2. Diffie-Hellman Example 

Obviously, a much larger value of p is required than used in the example 
to make the key agreement potentially secure. If the prime number 37 was 
used, Eve could simply try all possible values of $2^{xy} \bmod 37$. Because 2 is a 
primitive root modulo 37, this can take 36 values. A key space with only 36 
possibilities can be exhausted with ease. However, if the prime number used is 
large enough, no computing power available today can exhaust the key space. 
For instance, most applications recommend 1024-bit primes [2]. This corresponds 
to a number of about 300 digits and makes searching the key space one by one 
infeasible. Table 4 demonstrates how long it would take a modern personal 
computer (PC) and a super-computer (SC) to exhaust various sizes of key 
spaces. We assume a PC can search approximately one million ($10^6$) keys per 
second, while a super-computer can search approximately one trillion ($10^{12}$) keys 
per second. 


For instance, if a prime of 64 bits was used, it would correspond to a base-ten 
number of approximately 19 digits. The key space would be all the numbers 
$1, 2, \ldots, p-1$, which would be on the order of $10^{19}$ numbers. Therefore, a PC would 
take $\frac{10^{19}}{10^{6}} = 10^{13}$ seconds to completely search the entire key space. 

Bits    Digits (approximate)    PC time (approximate)        SC time (approximate) 

64      19                      317,098 years                115 days 
128     39                      $3 \times 10^{25}$ years     $3 \times 10^{19}$ years 
256     77                      $3 \times 10^{63}$ years     $3 \times 10^{57}$ years 
512     154                     $3 \times 10^{140}$ years    $3 \times 10^{134}$ years 
1024    308                     $3 \times 10^{294}$ years    $3 \times 10^{288}$ years 
2048    616                     $3 \times 10^{602}$ years    $3 \times 10^{596}$ years 

Table 4.   Times to Exhaust a Key Space 
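
The entries of Table 4 follow from a single division. The sketch below works with base-ten logarithms so that the largest key spaces do not overflow floating-point arithmetic; the function name `log10_years` is our own, and a 365-day year is assumed.

```python
from math import log10

SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a 365-day year

def log10_years(digits: int, keys_per_second: int) -> float:
    """log10 of the years needed to try 10**digits keys at the given rate."""
    return digits - log10(keys_per_second) - log10(SECONDS_PER_YEAR)

pc_years_64_bit = 10 ** log10_years(19, 10**6)          # about 317,098 years
sc_days_64_bit = 10 ** log10_years(19, 10**12) * 365    # about 115 days
```

For the 1024-bit row, `log10_years(308, 10**6)` is roughly 294.5, which corresponds to the entry of about $3 \times 10^{294}$ years.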

Considering most applications use primes of 1024 bits or greater, it is 
obviously infeasible to conduct a random search of an entire key space. Of 
course, one could get lucky and the key could be one of the first numbers 
searched by the computer. However, as indicated by the enormous times listed 
in the table, it is more likely a random key search would take longer than most 
scientists believe the universe has existed. 


B. THE DISCRETE LOGARITHM 


Eve has more information than just the fact that the key resides in the 
interval $[1, p-1]$. Because the exchange occurs over an open channel, Eve 
knows $\alpha^x$ and $\alpha^y$ as well. If $\beta \equiv \alpha^x \pmod{p}$ and $\gamma \equiv \alpha^y \pmod{p}$, then $p, \alpha, \beta$, 
and $\gamma$ are known. All Eve has to do is solve $\alpha^x \equiv \beta \pmod{p}$ for x or 
$\alpha^y \equiv \gamma \pmod{p}$ for y. Once x or y is known, Eve simply raises $\alpha^x$ to y or $\alpha^y$ 
to x and arrives at the secret key K. However, if p is large, solving 
$\alpha^x \equiv \beta \pmod{p}$ for x in general is considered difficult. The problem of finding x 
in this case is known as the discrete logarithm problem (DLP), with the solution often 
abbreviated $x = L_\alpha(\beta)$. 
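
For a small prime, Eve can solve the DLP by exhaustive search. The sketch below applies this to the earlier example with p = 37 and generator 2; the function name `discrete_log_bruteforce` is our own, and the running time grows linearly in p, which is exactly why the approach fails for large primes.

```python
def discrete_log_bruteforce(alpha: int, beta: int, p: int):
    """Return the smallest x >= 0 with alpha^x = beta (mod p), or None."""
    value = 1                       # alpha^0
    for x in range(p - 1):
        if value == beta % p:
            return x
        value = (value * alpha) % p
    return None

# Eve intercepts 30 and 5, recovers Alice's secret, and derives the key.
x_recovered = discrete_log_bruteforce(2, 30, 37)
key = pow(5, x_recovered, 37)
```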


The difficulty of solving the DLP yields useful cryptosystems. The 
Diffie-Hellman key exchange protocol, the ElGamal encryption system, and the Digital 
Signature Algorithm all rely on the difficulty of solving the DLP. However, not all 
public-key crypto systems rely on the difficulty of the DLP. Another number 
theory problem that yields cryptosystems is the problem of factoring large 
integers. RSA, considered by many to be the most popular public-key 
cryptography algorithm, relies on the difficulty of factorization for its security. The 
size of the largest primes for which discrete logs can be computed has usually 
been approximately the same size as the size of largest integers that could be 
factored [11]. In 2005, a 168 digit prime (556 bits) discrete logarithm was 
computed, setting a record at that time. The record factorization up to then was 
200 digits (663 bits). 
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As discussed above, if p is small, it is easy to compute discrete logs by 
exhaustive search. However, when is p large, this is not feasible. We will now 


discuss several methods of attacking the DLP. 


1. The Pohlig-Hellman Algorithm 
Pohlig and Hellman introduced the following algorithm in 1978 to solve
discrete logs when $p-1$ has only small prime factors [11], [12].

Suppose

$$p - 1 = \prod_i q_i^{r_i}$$

is the factorization of $p-1$ into primes. Let $q^r$ be one of the factors.
The idea is to compute $x \pmod{q^r}$ for each $q_i^{r_i}$ and combine the
results using the Chinese Remainder Theorem to find the discrete logarithm.

Thus, $x \pmod{q^r}$ is found by writing
$x = x_0 + x_1 q + x_2 q^2 + \cdots$ with $0 \le x_i \le q-1$ and
determining the coefficients $x_0, x_1, \ldots, x_{r-1}$.

General idea: Starting with $\beta \equiv \alpha^x$, raise both sides to the
$(p-1)/q$ power to obtain

$$\beta^{(p-1)/q} \equiv \alpha^{x(p-1)/q} \equiv \alpha^{x_0(p-1)/q}\,(\alpha^{p-1})^{x_1 + x_2 q + \cdots} \equiv \alpha^{x_0(p-1)/q} \pmod{p}.$$

To find $x_0$, simply look at the powers

$$\alpha^{k(p-1)/q} \pmod{p}, \quad k = 0, 1, 2, \ldots, q-1,$$

until one of them yields $\beta^{(p-1)/q}$. Then $x_0 = k$.

An extension of this idea yields the remaining coefficients. Assume
$q^2 \mid p-1$. Let

$$\beta_1 \equiv \beta\alpha^{-x_0} \equiv \alpha^{q(x_1 + x_2 q + \cdots)} \pmod{p}.$$

Raise both sides to the $(p-1)/q^2$ power to obtain

$$\beta_1^{(p-1)/q^2} \equiv \alpha^{(p-1)(x_1 + x_2 q + \cdots)/q} \equiv \alpha^{x_1(p-1)/q}\,(\alpha^{p-1})^{x_2 + x_3 q + \cdots} \equiv \alpha^{x_1(p-1)/q} \pmod{p}.$$

To find $x_1$, simply look at the powers

$$\alpha^{k(p-1)/q} \pmod{p}, \quad k = 0, 1, 2, \ldots, q-1,$$

until one of them yields $\beta_1^{(p-1)/q^2}$. Then $x_1 = k$.

If $q^3 \mid p-1$, let $\beta_2 \equiv \beta_1\alpha^{-x_1 q}$, raise both
sides to the $(p-1)/q^3$ power, and find $x_2$.

We can continue this process until we find that $q^{r+1}$ does not divide
$p-1$. We have then determined $x_0, x_1, \ldots, x_{r-1}$, so we know
$x \pmod{q^r}$.

Repeat this procedure for all prime factors of $p-1$. This yields
$x \pmod{q_i^{r_i}}$ for each $q_i^{r_i}$, and we combine these using the
Chinese Remainder Theorem to find $x \pmod{p-1}$. Since $0 \le x < p-1$,
this determines $x$.


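As an example, let us solve $2^x \equiv 3 \pmod{101}$ for $x$. Here
$p - 1 = 100 = 2^2 \cdot 5^2$, so $q = 2, 5$.

First, we find $x \pmod{2^2}$. Let $x \equiv x_0 + 2x_1 \pmod{2^2}$. Then

$$\beta^{(p-1)/2} = 3^{50} \equiv -1 \pmod{101} \quad\text{and}\quad \alpha^{(p-1)/2} = 2^{50} \equiv -1 \pmod{101}.$$

So $-1 \equiv (-1)^{x_0}$ and $x_0 = 1$.
Continuing, $\beta_1 \equiv \beta\alpha^{-x_0} \equiv 3 \cdot 2^{-1} \equiv 3 \cdot 51 \equiv 52 \pmod{101}$.
So $\beta_1^{(p-1)/4} = 52^{25} \equiv 1 \pmod{101}$ and $1 \equiv (-1)^{x_1}$.
So $x_1 = 0$ and $x \equiv 1 + 2 \cdot 0 \equiv 1 \pmod{2^2}$.

Next, we find $x \pmod{5^2}$. Let $x \equiv x_0 + 5x_1 \pmod{5^2}$. Then

$$\beta^{(p-1)/5} = 3^{20} \equiv 84 \pmod{101} \quad\text{and}\quad \alpha^{(p-1)/5} = 2^{20} \equiv 95 \pmod{101}.$$

We make a list,

$$95^0 \equiv 1;\ 95^1 \equiv 95;\ 95^2 \equiv 36;\ 95^3 \equiv 87;\ 95^4 \equiv 84 \pmod{101}.$$

Matching with the list, we see that $x_0 = 4$.
Continuing, we get $\beta_1 \equiv \beta\alpha^{-4} \equiv 3 \cdot 2^{-4} \equiv 3 \cdot 19 \equiv 57 \pmod{101}$.
So $\beta_1^{(p-1)/25} = 57^4 \equiv 87 \pmod{101}$.
We again compare with the above list and see that $95^3 \equiv 87$, so
$x_1 = 3$. This leads to $x \equiv 4 + 5 \cdot 3 \equiv 19 \pmod{5^2}$.

Now, we combine $x \equiv 1 \pmod{2^2}$ and $x \equiv 19 \pmod{5^2}$ using
the Chinese Remainder Theorem to find $x = 69$. So
$2^{69} \equiv 3 \pmod{101}$.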


It is well known that the time complexity of the Pohlig-Hellman algorithm is
$O(\sqrt{p})$ [11].
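The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than an optimized implementation: the function names are hypothetical, $p-1$ is factored by trial division (reasonable only for toy primes), and `pow` with a negative exponent assumes Python 3.8 or later.

```python
from math import prod

def factorize(n):
    """Trial-division factorization; returns {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def dlog_prime_power(alpha, beta, p, q, r):
    """Return x mod q**r by finding the base-q digits x_0, ..., x_{r-1}."""
    base = pow(alpha, (p - 1) // q, p)            # alpha^((p-1)/q)
    table = {pow(base, k, p): k for k in range(q)}
    x, beta_j = 0, beta % p
    for j in range(r):
        target = pow(beta_j, (p - 1) // q ** (j + 1), p)
        digit = table[target]                     # x_j with base^x_j == target
        x += digit * q ** j
        # beta_{j+1} = beta_j * alpha^(-x_j * q^j)
        beta_j = beta_j * pow(alpha, -digit * q ** j, p) % p
    return x

def pohlig_hellman(alpha, beta, p):
    """Solve alpha^x = beta (mod p) for a primitive root alpha of the prime p."""
    residues = [(dlog_prime_power(alpha, beta, p, q, r), q ** r)
                for q, r in factorize(p - 1).items()]
    modulus = prod(m for _, m in residues)        # equals p - 1
    x = 0
    for a, m in residues:                         # Chinese Remainder Theorem
        rest = modulus // m
        x += a * rest * pow(rest, -1, m)
    return x % modulus
```

Calling `pohlig_hellman(2, 3, 101)` retraces the worked example: it finds $x \equiv 1 \pmod 4$ and $x \equiv 19 \pmod{25}$ and combines them into $x = 69$.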


2. Baby Step, Giant Step 

Eve is trying to solve $\alpha^x \equiv \beta \pmod{p}$ for $x$. The
following algorithm was developed by Daniel Shanks [11].

First, choose an integer $N$ with $N^2 \ge p-1$. Next, make two lists:

1. $\alpha^j \bmod p$ for $0 \le j < N$

2. $\beta\alpha^{-Nk} \bmod p$ for $0 \le k < N$

Look for a match between the two lists. If one is found, then
$\alpha^j \equiv \beta\alpha^{-Nk}$, so $\alpha^{j+Nk} \equiv \beta$.
Therefore, $x = j + Nk$ and the discrete logarithm is solved.


The complexity of the baby step, giant step algorithm is also $O(\sqrt{p})$,
but it requires storing approximately $\sqrt{p}$ numbers in memory and is
therefore


impractical for very large primes, such as $10^{20}$ or larger [11].
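A minimal Python sketch of the two-list search follows. The function name is illustrative, the first list is stored in a dictionary so that each lookup takes constant time, and `pow` with a negative exponent assumes Python 3.8 or later.

```python
from math import isqrt

def baby_step_giant_step(alpha, beta, p):
    """Return x with alpha^x = beta (mod p), or None if no match is found."""
    n = isqrt(p - 1) + 1                 # guarantees n * n >= p - 1
    baby = {}
    value = 1
    for j in range(n):                   # list 1: alpha^j for 0 <= j < N
        baby.setdefault(value, j)
        value = value * alpha % p
    factor = pow(alpha, -n, p)           # alpha^(-N)
    giant = beta % p
    for k in range(n):                   # list 2: beta * alpha^(-N k)
        if giant in baby:                # match: alpha^j == beta * alpha^(-N k)
            return baby[giant] + n * k   # x = j + N k
        giant = giant * factor % p
    return None
```

For the earlier example, `baby_step_giant_step(2, 3, 101)` uses $N = 11$ and finds the match at $j = 3$, $k = 6$, giving $x = 3 + 11 \cdot 6 = 69$. The dictionary holds about $\sqrt{p}$ entries, which is the memory cost noted above.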


3. The Index Calculus 
Again, Eve is trying to solve $\alpha^x \equiv \beta \pmod{p}$ for $x$. The
idea in the index calculus method is similar to the quadratic sieve method of
factoring [11].

The first step is a precomputation step and involves picking a factor base
and searching for a set of $r$ linearly independent relations between the
factor base and the powers of $\alpha$. Let $B$ be a bound and let
$p_1, p_2, \ldots, p_m$ be the primes less than $B$. This is our factor base.
We then compute $\alpha^k \pmod{p}$ for successive values of $k$ until $r$
relations have been collected. For each number, try to write it as a product
of primes in the factor base. If this is not possible, discard $\alpha^k$.
However, if $\alpha^k \equiv \prod_i p_i^{a_i} \pmod{p}$, then

$$k \equiv \sum_i a_i L_\alpha(p_i) \pmod{p-1}.$$

When we obtain enough relations, we can solve for $L_\alpha(p_i)$ for each
$i$.

Next, for random integers $s$, compute $\beta\alpha^s \pmod{p}$. For each
such number, try to write it as a product of primes less than $B$. If we
succeed, we have $\beta\alpha^s \equiv \prod_i p_i^{b_i} \pmod{p}$, which
means

$$L_\alpha(\beta) \equiv -s + \sum_i b_i L_\alpha(p_i) \pmod{p-1}.$$

Using this algorithm, any $p$ over 200 digits will be difficult to solve,
which makes the Index Calculus good only for moderate-sized primes [11]. One
can show that the time complexity of the Index Calculus is
$O\left(e^{c(\ln p)^{1/3}(\ln\ln p)^{2/3}}\right)$ for some $c > 0$, if
implemented by the Number Field Sieve.
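The relation-collection step can be illustrated with a short Python sketch. It covers only the precomputation search for smooth powers of $\alpha$; solving the resulting linear system modulo $p-1$ is omitted. The function names and the tiny factor base are assumptions chosen for illustration.

```python
def factor_over_base(n, base):
    """Return exponents [a_1, ..., a_m] if n factors over base, else None."""
    exponents = []
    for q in base:
        e = 0
        while n % q == 0:
            n //= q
            e += 1
        exponents.append(e)
    return exponents if n == 1 else None

def find_relations(p, alpha, base, count):
    """Collect up to `count` pairs (k, exponents) with
    alpha^k = prod(base[i] ** exponents[i]) (mod p)."""
    relations = []
    k = 1
    while len(relations) < count and k < p - 1:
        exponents = factor_over_base(pow(alpha, k, p), base)
        if exponents is not None:        # alpha^k is smooth: keep the relation
            relations.append((k, exponents))
        k += 1
    return relations
```

With $p = 101$, $\alpha = 2$ and factor base $\{2, 3, 5, 7\}$, the search yields relations such as $2^7 \equiv 3^3$ and $2^9 \equiv 7 \pmod{101}$, each of which gives one linear congruence in the unknown logarithms $L_\alpha(p_i)$.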


C. THE DIFFIE-HELLMAN PROBLEM 


We described how solving the discrete logarithm easily would allow Eve to
arrive at the secret key. There is another problem Eve can solve to arrive at
the secret key, namely the Diffie-Hellman Problem. The Diffie-Hellman Problem
comes in two flavors, the computational and the decisional. The Computational
Diffie-Hellman Problem is defined as follows: Let $p$ be a prime and let
$\alpha$ be a primitive root mod $p$. Given $\alpha^x \pmod{p}$ and
$\alpha^y \pmod{p}$, find $\alpha^{xy} \pmod{p}$. Recall that Eve has access
to both $\alpha^x$ and $\alpha^y$, as they are both made public during the
exchange. It is not currently known whether or not this problem is easier than
computing discrete logs [11]. A related problem, known as the Decisional
Diffie-Hellman Problem, is defined as follows: Let $p$ be a prime and let
$\alpha$ be a primitive root mod $p$. Given $\alpha^x \pmod{p}$,
$\alpha^y \pmod{p}$, and $K \not\equiv 0 \pmod{p}$, decide whether or not
$K \equiv \alpha^{xy} \pmod{p}$ [11]. In other words, if someone offers a
number to Eve and claims it is $K$, can Eve decide whether or not that person
is telling the truth with the information captured in the open channel? Like
the computational Diffie-Hellman problem, the decisional Diffie-Hellman
problem has yet to be solved. It is unknown whether a method for solving the
decisional problem will lead to a solution for the computational problem.


The methods described above for solving discrete logarithms force
applications that rely on the difficulty of solving discrete logs to stay away
from certain primes. Obviously, the larger the prime used, the better.
Baby-step Giant-step and the Index Calculus become infeasible to use when
primes are larger than 200 digits. The Pohlig-Hellman algorithm relies on the
factorization of $p-1$ consisting of only small primes. If $p-1$ does not
have only small prime factors, the algorithm becomes inefficient. Therefore,
the primes chosen when using the Diffie-Hellman protocol should have at least
one large prime in the factorization of $p-1$. This situation gives rise to
the attack we will focus on. If $p-1$ contains a very large prime, such that
$p-1 = Rq$ with $q$ prime and $R$ a small integer, an unauthenticated
exchange becomes vulnerable to an active man-in-the-middle attack that we
will discuss next.




IV. MAN-IN-THE-MIDDLE ATTACK 


A. THEORY BEHIND THE ATTACK 


Wiener and van Oorschot [2] noted that, if certain primes are used, a
potentially fatal protocol attack on the Diffie-Hellman key exchange protocol
becomes possible. The idea is based on forcing the parties to agree on a
shared key that resides in a subgroup of the cyclic group $\mathbb{Z}_p^*$.
If the order of the subgroup is small enough, an adversary can exhaustively
search the subgroup, retrieve the secret key, and eavesdrop on the
communication of Alice and Bob.

For instance, consider the case when the prime used for the key exchange is
of the form $p = 2q+1$, where $q$ is a prime. Then
$\alpha^q = \alpha^{(p-1)/2}$.

Claim: $\alpha^{(p-1)/2}$ is an element of order two.

Proof: By Fermat's little theorem, $\alpha^{p-1} \equiv 1 \pmod{p}$. So
$\alpha^{(p-1)/2}$ is a square root of 1 and must be $+1$ or $-1$. But if
$\alpha^{(p-1)/2} \equiv 1$, then the order of $\alpha$ divides $(p-1)/2$.
This is a contradiction, because $\alpha$ is a primitive root of
$\mathbb{Z}_p^*$ and must be of order $p-1$. So
$\alpha^{(p-1)/2} \equiv -1$ and is an element of order two. $\square$

If Alice and Bob respectively send each other unauthenticated messages
$\alpha^x$ and $\alpha^y$, an active intruder may substitute $(\alpha^x)^q$
for the first and $(\alpha^y)^q$ for the second. When Alice receives
$(\alpha^y)^q$ and computes $(\alpha^{yq})^x$, and when Bob receives
$(\alpha^x)^q$ and computes $(\alpha^{xq})^y$, they will arrive at only one
of two possible values, $+1$ and $-1$. The intruder can then try both
possible keys and gain access to Alice and Bob's secret communications.
Obviously, if Alice and Bob demonstrate vigilance, they will agree in advance
to suspect any key agreement that arrives at $+1$ or $-1$.

We can generalize the situation if Alice and Bob use a prime number of the
form $p = Rq+1$, where $R$ is a small integer and $q$ is again a large prime.


Claim: $\alpha^{(p-1)/R}$ is an element of order $R$.

Proof: Raising $\alpha^{(p-1)/R}$ to consecutive powers, starting with 0, we
get:

$$(\alpha^{(p-1)/R})^0 = 1,\ (\alpha^{(p-1)/R})^1,\ (\alpha^{(p-1)/R})^2,\ \ldots,\ (\alpha^{(p-1)/R})^R = \alpha^{p-1} = 1.$$

This produces a list of $R$ different values, because $\alpha$ is a primitive
root and the exponents $k(p-1)/R$, $0 \le k < R$, are distinct and less than
$p-1$. Continuing after $R$,

$$(\alpha^{(p-1)/R})^{R+1} = (\alpha^{(p-1)/R})^R (\alpha^{(p-1)/R})^1 = 1 \cdot (\alpha^{(p-1)/R})^1,$$

$$(\alpha^{(p-1)/R})^{R+2} = (\alpha^{(p-1)/R})^R (\alpha^{(p-1)/R})^2 = 1 \cdot (\alpha^{(p-1)/R})^2,$$

$$\ldots$$

$$(\alpha^{(p-1)/R})^{R+n} = (\alpha^{(p-1)/R})^R (\alpha^{(p-1)/R})^n = (\alpha^{(p-1)/R})^n.$$

For $n < R$, the results are in the original list.
For $n \ge R$, we can write $R + n = R + kR + m$ with $0 \le m \le R-1$ and
$m, k \in \mathbb{Z}$. Then

$$(\alpha^{(p-1)/R})^{R+kR+m} = (\alpha^{(p-1)/R})^R \left((\alpha^{(p-1)/R})^R\right)^k (\alpha^{(p-1)/R})^m = [1] \cdot [1]^k \cdot (\alpha^{(p-1)/R})^m = (\alpha^{(p-1)/R})^m.$$

Because $0 \le m \le R-1$, this is in our original list and
$\alpha^{(p-1)/R}$ is of order $R$. $\square$

So, if the prime Alice and Bob agree to use is of the form $p = Rq+1$, Eve
can force them to agree on a key in a subgroup of $\mathbb{Z}_p^*$ of order
$R$ by replacing $\alpha^x$ and $\alpha^y$ with $(\alpha^x)^q$ and
$(\alpha^y)^q$. Even if Alice and Bob are vigilant, the key can take any of
$R$ values, and the generalized attack poses a significant threat to an
unauthenticated key exchange using the Diffie-Hellman protocol.
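The claim can be checked numerically for a small prime. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the primitive root is found by a direct search rather than assumed, since the claim depends on $\alpha$ actually being a primitive root.

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_root(p):
    """Smallest primitive root of the prime p."""
    factors = prime_factors(p - 1)
    for g in range(2, p):
        # g is a primitive root iff g^((p-1)/f) != 1 for every prime f | p-1
        if all(pow(g, (p - 1) // f, p) != 1 for f in factors):
            return g
    raise ValueError("no primitive root found")

def subgroup_elements(p, R):
    """Return h = alpha^((p-1)/R) and the list [h^0, h^1, ..., h^(R-1)]."""
    alpha = primitive_root(p)
    h = pow(alpha, (p - 1) // R, p)
    return h, [pow(h, k, p) for k in range(R)]
```

For $p = 19991 = 10 \cdot 1999 + 1$, `subgroup_elements(19991, 10)` returns ten distinct values, and every higher power of $h$ falls back into that list, as the proof describes.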
B. CREATING THE ENVIRONMENT


Eve must force Alice and Bob into a subgroup of small order to conduct this
attack. Figure 3 represents a possible algorithm Eve could follow.

NOTE: Eve only needs to consider cases when $R$ is even, because if $R$ is
odd, $(p-1)/R$ must be even and, being large, cannot be prime. Also, if Eve
calculates $(p-1)/m$, $m \in \mathbb{Z}$, as a non-integer, she can ignore
any number of the form $(p-1)/(km)$, $k \in \mathbb{Z}$, because it will also
not be an integer.

Figure 3. Attack Algorithm


The most important step in creating the environment to conduct the attack is
searching $(p-1)/k$, $k = 2, 4, \ldots, R$, until we find a prime. We cannot
continue the attack until we find such a prime. Obviously, the longer Alice
and Bob are kept waiting for return correspondence, the more suspicious they
will become of possible compromise of their communication. Therefore, we need
the fastest possible method to detect primality. From our discussion in
Chapter II, we know a probabilistic primality test suits us best.
Specifically, we could use the Miller-Rabin primality test with complexity
$O((\log n)^3)$.

If we are forced to search the entire index $k$ over $2 \le k \le R$, how
long might this take us? Recall that we only need to try even values of $k$,
and in the worst case, we may be forced to try all $R/2$ even numbers.
Therefore, the worst case scenario in searching for a prime would take

$$\sum_{\substack{k=2 \\ k \text{ even}}}^{R} O((\log N)^3) = O((R/2) \cdot (\log N)^3) = O((\log N)^3)$$

steps, with $N$ being the input number into the Miller-Rabin primality test.
Thus, for a fixed bound $R$, the constant value in the Big-O estimate
changes, but the algorithm remains bounded by the time it takes to conduct
the primality tests.
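The search can be sketched in Python as follows. This is one possible rendering, not the thesis's Figure 3: the function names, the number of Miller-Rabin rounds, and the list of small trial divisors are assumptions. The divisibility test `(p - 1) % k == 0` subsumes the rule about skipping multiples of a non-dividing $m$.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:                    # write n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True

def find_subgroup_parameters(p, bound):
    """Return (R, q) with p - 1 = R * q, R even, R <= bound and q a
    probable prime, or None if no such pair exists."""
    for k in range(2, bound + 1, 2):     # only even k need to be tried
        if (p - 1) % k == 0 and is_probable_prime((p - 1) // k):
            return k, (p - 1) // k
    return None
```

For instance, `find_subgroup_parameters(10007, 100)` stops at the first candidate with $(R, q) = (2, 5003)$, while `find_subgroup_parameters(19991, 100)` rejects $9995$ and the non-dividing values 4, 6, and 8 before returning $(10, 1999)$.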


As an example, suppose Eve was listening as Alice and Bob agreed upon the
prime number to use for a key exchange to take place in the near future. The
prime number they choose is $p = 10007$ with base $\alpha = 3$. Eve uses the
attack algorithm in Figure 3 to attempt to force Alice and Bob to agree to a
key in a subgroup of $\mathbb{Z}_{10007}^*$.

First, $\dfrac{p-1}{2} = \dfrac{10006}{2} = 5003$.

Next, Eve runs 5003 through the Miller-Rabin primality test and the result
is prime.

This situation represents the initial case described above with the prime
number being of the form $p = 2q+1$. Specifically,
$10007 = 2 \cdot 5003 + 1$. Next, Eve must intercept the number Alice
attempts to send to Bob. Suppose Alice chooses $x = 758$ and attempts to send
$\alpha^x$ ($3^{758} \bmod 10007 = 4865$) to Bob.

A → E: 4865

Eve intercepts the communication, then takes $\alpha^x$ (4865) and raises it
to the $q$ power.

$$(\alpha^x)^q = (3^{758})^{5003} \bmod 10007$$

Meanwhile, Eve must also intercept the number Bob is attempting to send to
Alice. Suppose Bob chooses $y = 555$ and attempts to send $\alpha^y$
($3^{555} \bmod 10007 = 1771$) to Alice.

B → E: 1771

Eve again intercepts the communication, and takes $\alpha^y$ (1771) and
raises it to the $q$ power.

$$(\alpha^y)^q = (3^{555})^{5003} \bmod 10007$$

Eve then sends the results to the intended recipients.

E → B: $4865^q \bmod 10007$
E → A: $1771^q \bmod 10007$

Alice and Bob then both finish the key agreement by raising the received
number to their private exponents, $x$ and $y$ respectively, and arrive at
the same number, the "secret" key.

$$(\alpha^{yq})^x = (\alpha^{xq})^y$$

As a result of the theory discussed above, without any knowledge of $x$ or
$y$, Eve knows the only possible keys are 1 and 10006. Eve must wait for a
message to be sent between Alice and Bob, try both keys, and figure out which
one is being used. She can then eavesdrop, and Alice and Bob's secret
communication has been compromised.

However, as mentioned before, any vigilance on the part of Alice or Bob would
cause suspicion if the key agreed upon were of the form $+1$ or $-1$.

Now, suppose the prime number Alice and Bob agreed upon was $p = 19991$ and
$\alpha = 3$. Eve must again search for a large prime factor of $p-1$.


First, $\dfrac{p-1}{2} = \dfrac{19990}{2} = 9995$.

Next, Eve would run 9995 through the Miller-Rabin primality test. However,
because it ends with a five, five must be a factor and it cannot be a prime
number.

Continuing, $\dfrac{p-1}{4} = \dfrac{19990}{4} = 4997.5$ is not an integer.

$\dfrac{p-1}{6} = \dfrac{19990}{6} \approx 3331.67$ is not an integer.

Because $\dfrac{p-1}{4}$ was not an integer, we skip $\dfrac{p-1}{8}$.

$\dfrac{p-1}{10} = \dfrac{19990}{10} = 1999$.

Next, Eve runs 1999 through the Miller-Rabin primality test and the result
is prime.

Eve has found a large prime factor of $p-1$. This situation resembles the
generalized attack with a prime of the form $p = Rq+1$; in this case
$19991 = 10 \cdot 1999 + 1$. Intercepting, altering, and retransmitting the
messages as she did above, Eve again forces Alice and Bob into a subgroup of
the original cyclic group. This time, however, there are up to ten
possibilities for the "secret" key:

$$(\alpha^{(p-1)/R})^1,\ (\alpha^{(p-1)/R})^2,\ \ldots,\ (\alpha^{(p-1)/R})^R.$$

The key lies in the cyclic subgroup of $\mathbb{Z}_{19991}^*$ generated by
$3^{1999}$, whose order is at most ten, so Alice and Bob can arrive at no
more than ten values for their key. Eve must wait for Alice and Bob to
communicate with their new key and see which of the ten values Alice and Bob
agreed on. Once a message is intercepted, Eve can pull it offline, attempt
each possible key, determine the key they agreed upon, and listen in on Alice
and Bob's communication.
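Both worked examples can be replayed with a short simulation. The sketch below rests on several assumptions: the function names are hypothetical, and `key_check` (a SHA-256 hash of the key) is only a stand-in for whatever intercepted ciphertext would let Eve recognize the correct key offline.

```python
import hashlib

def tampered_exchange(p, alpha, q, x, y):
    """Unauthenticated Diffie-Hellman in which Eve raises both public
    values to the q-th power before forwarding them."""
    to_bob = pow(pow(alpha, x, p), q, p)      # Eve replaces alpha^x
    to_alice = pow(pow(alpha, y, p), q, p)    # Eve replaces alpha^y
    key_alice = pow(to_alice, x, p)           # (alpha^(yq))^x
    key_bob = pow(to_bob, y, p)               # (alpha^(xq))^y
    return key_alice, key_bob

def candidate_keys(p, alpha, q, R):
    """Every value the forced key can take: the powers of alpha^q."""
    h = pow(alpha, q, p)
    return {pow(h, k, p) for k in range(R)}

def fingerprint(key):
    """Hypothetical key check standing in for an intercepted ciphertext."""
    return hashlib.sha256(str(key).encode()).hexdigest()

def recover_key(p, alpha, q, R, key_check):
    """Eve's offline search over at most R candidate keys."""
    for z in candidate_keys(p, alpha, q, R):
        if fingerprint(z) == key_check:
            return z
    return None
```

With `p = 10007, q = 5003` the forced key is always 1 or 10006; with `p = 19991, q = 1999, R = 10` the search in `recover_key` inspects at most ten candidates, regardless of the secret exponents $x$ and $y$.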


C. PRIMES OF THE FORM Rq + 1


For this man-in-the-middle attack to be possible, Alice and Bob must agree to
choose a prime of the form $Rq+1$. How likely is it, assuming Alice and Bob
are using random large primes, that the prime they choose will be of the
correct form? To answer this question, we must first count the number of
primes $p$ such that $p = Rq+1$. We can begin with the case where $R = 2$.
This represents the original case in the man-in-the-middle attack, where
$p = 2q+1$. The primes $q$ that arise here have their own name. A prime $q$
is a so-called Sophie Germain (SG) prime if $2q+1$ is also prime. If we let
$\pi_{SG}(t)$ be the number of SG primes not exceeding $t$, it can be
demonstrated that

$$\pi_{SG}(t) = O\left(\frac{t}{(\log t)^2}\right) \quad [13]$$





Now, considering the general case, if we fix $R$, then the number of primes
$p \le t$ of the form $p = Rq+1$ is

$$O\left(\frac{t}{\phi(R)\,(\log (t/R))^2}\right)$$

where $\phi$ is Euler's Phi function [14]. However, in the attack $R$ can
range from 2 to some bound, say $B$. Therefore, we must sum the cases from
$R = 2$ to $R = B$. The number of primes $p$ such that $p = Rq+1$ with $q$
prime, ranging over $2 \le R \le B$ with $B < t^{1/2}$, is

$$O\left(\frac{t}{(\log t)^2} \sum_{R=2}^{B} \frac{1}{\phi(R)}\right) = O\left(\frac{t \log B}{(\log t)^2}\right) \quad [14]$$

The prime number theorem states that, if $\pi(x)$ is the prime counting
function, then

$$\lim_{x \to \infty} \frac{\pi(x)}{x / \ln(x)} = 1.$$

Roughly speaking, this tells us that if you randomly select a number close to
a large number $N$, the odds of it being prime are about $1/\ln(N)$. By the
prime number theorem, it follows that

$$\lim_{x \to \infty} \frac{\pi_{SG}(x)}{\pi(x)} = 0.$$

If we let $\pi_{Rq+1}(t)$ count the number of primes of the form $p = Rq+1$
not exceeding $t$, it follows that

$$\lim_{x \to \infty} \frac{\pi_{Rq+1}(x)}{\pi(x)} = 0$$

as well. This tells us that, as $x$ gets very large, a random prime number
becomes increasingly unlikely to be a Sophie Germain prime or any prime of
the form $Rq+1$.

Using the prime number theorem and Big-O estimates above with a constant
value of one, we can approximate the numbers of primes of different forms.
Table 5 lists these approximations using scientific notation. The $R$ value
corresponds to different values for primes of the form $p = Rq+1$. The ratios
listed are: (primes of the given form) / (total primes).
                  0-64 bits        64-128 bits      128-256 bits
Total Primes      4.1583e17        3.8353e36        6.5255e74
R=2 (S.G.)        9.3737e15        4.3228e34        3.6775e72
                  ratio: .0225     ratio: .0113     ratio: .0056
R=100             4.316e16         1.9907e35        1.6935e73
                  ratio: .1038     ratio: .0519     ratio: .0260
R=10^4            8.6335e16        3.9815e35        3.3871e73
                  ratio: .2076     ratio: .1038     ratio: .0519
R=10^6            1.295e17         5.9722e35        5.0806e73
                  ratio: .3114     ratio: .1557     ratio: .0779

                  256-512 bits     512-1024 bits    1024-2048 bits
Total Primes      3.778e151        2.5327e305       2.2765e613
R=2 (S.G.)        1.0646e149       3.5683e302       1.6037e610
                  ratio: .0028     ratio: .0014     ratio: .0007
R=100             4.9024e149       1.6433e303       7.3853e610
                  ratio: .0130     ratio: .0065     ratio: .0032
R=10^4            9.8049e149       3.2865e303       1.477e611
                  ratio: .0260     ratio: .0130     ratio: .0065
R=10^6            1.4707e150       4.9298e303       2.2156e611
                  ratio: .0389     ratio: .0195     ratio: .0097

Table 5. Prime Number Approximations


The approximations demonstrate the increasing unlikelihood of a random prime
being of the form $p = Rq+1$. Using our approximations, around 64 bits over
30% of all primes match the form with a bound of $10^6$. However, when we
consider primes around 2048 bits, the percentage drops below one. If we
increase the bound we can increase the likelihood, but increasing the bound
forces the attacker to search through more keys to find the correct one.
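The trend can also be observed empirically for small limits. The following sketch counts exactly, with a sieve, rather than using the asymptotic estimates behind Table 5, so its figures are not comparable to the table entries; the function names are hypothetical, and only even $R$ are tried, in line with the earlier note.

```python
def sieve(limit):
    """Sieve of Eratosthenes: flags[n] == 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if flags[n]:
            flags[n * n::n] = bytearray(len(range(n * n, limit + 1, n)))
    return flags

def fraction_of_form(limit, bound):
    """Fraction of primes p <= limit with p = R*q + 1 for some prime q
    and some even R with 2 <= R <= bound."""
    flags = sieve(limit)
    primes = [n for n in range(2, limit + 1) if flags[n]]
    hits = sum(
        any((p - 1) % R == 0 and flags[(p - 1) // R]
            for R in range(2, bound + 1, 2))
        for p in primes
    )
    return hits / len(primes)
```

Comparing `fraction_of_form(1000, 2)` with `fraction_of_form(100000, 2)` shows the share of primes of the form $2q+1$ shrinking as the limit grows, while raising `bound` for a fixed limit increases the share.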


D. COUNTERMEASURES AGAINST THE ATTACK 


To prevent this potentially fatal protocol attack, Alice and Bob have 
several options. The easiest method is to force authentication prior to the key 
exchange. Another method that prevents the attack is based on creating a prime 


order subgroup before the key exchange takes place.


1. Authentication


The attack we have discussed is not the only man-in-the-middle attack 
Diffie-Hellman is vulnerable to. The Appendix details another attack, if no 
authentication occurs prior to the key exchange. To combat these attacks, a 
variation of Diffie-Hellman that ensures authentication can be used. An example 
of such a variation is the Station-to-Station protocol (STS). STS is a three-pass 
variation of the basic Diffie-Hellman protocol that allows the establishment of a 
shared secret key between two parties with mutual entity authentication and 
mutual explicit key authentication [1]. The STS employs digital signatures. A 
digital signature of a message is a number dependent on some secret known 
only to the signer; and, additionally, on the content of the message being signed 
[1]. The STS protocol is frequently employed with the RSA signature scheme. 


To employ an RSA signature scheme, public and private key pairs must first
be generated.

RSA signature scheme key generation steps [1]:

1. Generate two large distinct random primes $p$ and $q$, each roughly the
   same size

2. Compute $n = pq$ and $\phi = (p-1)(q-1)$

3. Select a random integer $e$, $1 < e < \phi$, such that
   $\gcd(e, \phi) = 1$

4. Use the extended Euclidean algorithm to compute the unique integer $d$,
   $1 < d < \phi$, such that $ed \equiv 1 \pmod{\phi}$

5. The user's public key is $(n, e)$ and the user's private key is $d$

NOTE: Each user should generate a public and private key

Now, if a user Alice wants to sign a message $m$, and a user Bob wants to
verify the message signature, the remaining steps of the protocol must be
completed.

RSA signature scheme protocol steps [1], where $R$ denotes a public
redundancy function:

1. Signature generation
   a. Compute $\tilde{m} = R(m)$, an integer in the range $[0, n-1]$
   b. Compute $s = \tilde{m}^d \bmod n$
   c. Alice's signature for $m$ is $s$.

2. Signature verification
   a. Obtain Alice's authentic public key $(n, e)$
   b. Compute $\tilde{m} = s^e \bmod n$
   c. Recover $m = R^{-1}(\tilde{m})$
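The key generation and signature steps above can be traced with toy numbers. The sketch below is a textbook illustration only: the function names are hypothetical, the primes are far too small for real use, the redundancy function $R$ is taken to be the identity (an assumption), and `pow(e, -1, phi)` stands in for the extended Euclidean algorithm (Python 3.8 or later).

```python
from math import gcd

def rsa_keygen(p, q, e):
    """Return ((n, e), d) for the given primes p, q and public exponent e."""
    n, phi = p * q, (p - 1) * (q - 1)
    if not (1 < e < phi) or gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi and gcd(e, phi) = 1")
    d = pow(e, -1, phi)                  # e * d = 1 (mod phi)
    return (n, e), d

def rsa_sign(m_tilde, d, n):
    """Signature generation: s = m_tilde^d mod n."""
    return pow(m_tilde, d, n)

def rsa_recover(s, public_key):
    """Signature verification: recover m_tilde = s^e mod n."""
    n, e = public_key
    return pow(s, e, n)
```

With `rsa_keygen(61, 53, 17)` the modulus is $n = 3233$ and the private exponent is $d = 2753$; signing $\tilde{m} = 65$ and then calling `rsa_recover` on the result returns 65.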


With the knowledge of a digital signature scheme, in particular RSA, we can
move on to the STS protocol. If we let $E$ denote a symmetric encryption
algorithm, and $S_A(m)$ denote Alice's signature on $m$, the protocol is as
follows [1]:

1. Set up
   a. A prime number $p$ and generator $\alpha$ of $\mathbb{Z}_p^*$
      $(2 \le \alpha \le p-2)$ are selected and published
   b. Alice selects RSA public and private signature keys $(n_A, e_A)$ and
      $d_A$ (Bob selects analogous keys). Assume each party has access to
      authentic copies of the other's public key.

2. Actions
   a. Alice generates a secret random $x$, $1 \le x \le p-2$, and sends to
      Bob $\alpha^x \bmod p$.

      A → B: $\alpha^x \bmod p$ (message 1)

   b. Bob generates a secret random $y$, $1 \le y \le p-2$, and computes the
      shared key $k = (\alpha^x)^y \bmod p$. Bob signs the concatenation of
      both exponentials, encrypts this using the computed key, and sends to
      Alice.

      B → A: $\alpha^y \bmod p$, $E_k(S_B(\alpha^y, \alpha^x))$ (message 2)

   c. Alice computes the shared key $k = (\alpha^y)^x \bmod p$, decrypts the
      encrypted data, and uses Bob's public key to verify the received value
      as the signature on the hash of the cleartext exponential received and
      the exponential sent in message 1. Upon successful verification, Alice
      accepts that $k$ is actually shared with Bob, and sends Bob an
      analogous message.

      A → B: $E_k(S_A(\alpha^x, \alpha^y))$ (message 3)

   d. Bob similarly decrypts the received message and verifies Alice's
      signature therein. If successful, Bob accepts that $k$ is actually
      shared with Alice.

The exchanged exponentials are digitally signed and retransmitted during the
STS protocol. Therefore, Eve cannot alter the original exponentials without
triggering a failure during Alice and Bob's key agreement. This precludes the
man-in-the-middle attack we have focused on and defends Alice and Bob's key
exchange against several other possible active man-in-the-middle attacks.


2. Prime Order Subgroups 


Van Oorschot and Wiener [2] noticed the potentially fatal man-in-the-middle
attack and reasoned that restricting computations to prime-order subgroups
would prevent the attack. In this case, we will force the prime number $p$
that defines the environment to be of the form $p = Rq+1$, where $R$ is a
small integer and $q$ is a large prime. Now, instead of using a generator
$\alpha$ of $\mathbb{Z}_p^*$ as our base for exponentiation, we compute
$g = \alpha^{(p-1)/q}$ and let $g$ be our new base.

Claim: The element $g$ generates a subgroup of order $q$.

Proof: Suppose $g$ is of order $k < q$ and so $g^k = 1$. Then
$\alpha^{(p-1)k/q} = 1$. But $k/q < 1$ and so $(p-1) \cdot k/q < (p-1)$.
This means $\alpha$ is of order $< (p-1)$, a contradiction because $\alpha$
is a generator of $\mathbb{Z}_p^*$. Therefore, $g$ must be of order
$\ge q$. But $g^q = \alpha^{(p-1)q/q} = \alpha^{p-1} = 1$, so $g$ is of
order $q$ and $\langle g \rangle$ is a subgroup of order $q$. $\square$

By using $g$ instead of $\alpha$ to conduct the key exchange, Alice and Bob
are working in a prime order subgroup instead of a group of order $p-1$. The
man-in-the-middle attack we have discussed is based on forcing the parties
into a subgroup of small order and exhaustively searching the smaller key
space. However, by Lagrange's theorem, the order of any subgroup must divide
the order of the group. The order of the group generated by $g$ is $q$.
Therefore, any subgroup must be of order $q$ or 1, because those are the only
divisors of $q$. Thus, the prime order subgroup cannot be divided any further
and this man-in-the-middle attack becomes infeasible.
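A small numeric sketch of the countermeasure follows. It makes two assumptions that go beyond the text: the generator is obtained as $h^{(p-1)/q}$ for the first $h$ that does not map to 1, which yields an element of order $q$ without first locating a primitive root, and `is_valid_public_value` is a hypothetical membership check that a receiver could apply to incoming values.

```python
def subgroup_generator(p, q):
    """Return an element of order q in Z_p^*, where q is a prime dividing p - 1."""
    cofactor = (p - 1) // q              # this is R in p = R*q + 1
    for h in range(2, p):
        g = pow(h, cofactor, p)
        if g != 1:                       # g^q = h^(p-1) = 1 and g != 1, so ord(g) = q
            return g
    raise ValueError("no generator found")

def is_valid_public_value(value, p, q):
    """Accept only non-identity elements of the order-q subgroup."""
    return 1 < value < p and pow(value, q, p) == 1
```

For $p = 19991$ and $q = 1999$, the generator found is $g = 2^{10} = 1024$. If Eve raises a public value $g^x$ to the power $q$, the result is the identity, which the check rejects, so the substitution no longer leaves a small set of plausible keys to hide in.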


The Internet Engineering Task Force (IETF) has adopted the prime order
subgroup tactic to prevent the type of attack we have focused on. In
particular, Request for Comments (RFC) 2631 standardizes the technique for a
particular Diffie-Hellman variant, based on the American National Standards
Institute X9.42 draft [15].


E. EXTENDING THE ATTACK TO THE N-PARTY SETTING 


The Diffie-Hellman protocol we have discussed so far has been limited to two
parties. However, protocols have been created that extend the key agreement
to group communications. Steiner, Tsudik and Waidner [16] defined a class of
natural extensions of Diffie-Hellman to the n-party setting. These protocols,
without the countermeasures discussed above, are vulnerable to the
man-in-the-middle attack we have focused on. We now move to demonstrate the
attack on two of the protocols the authors describe. First, we consider the
protocol the authors name Group Diffie-Hellman version 1 (GDH.1). In this
section, to keep with the original notation of [16], we use set notation to
mean an ordered tuple.

We call the participants of the n-party key exchange
$\{M_1, M_2, \ldots, M_n\}$. As in the two-party case, a prime number $p$
and a generator $\alpha$ of the group $\mathbb{Z}_p^*$ are selected and
published. Each member $M_i$ chooses a random secret number $s_i$,
$0 < s_i < p-2$. The protocol consists of two stages: upflow and downflow.

In the upflow stage, each member makes their contribution to the shared key.
A member $M_i$ receives a collection of intermediate values, and has the task
of raising the last in the list of incoming intermediate values to the power
of $s_i$. Then $M_i$ appends the result to the incoming set of values and
forwards all to $M_{i+1}$. As an example, $M_3$ would receive
$\{\alpha^{s_1}, \alpha^{s_1 s_2}\}$ from $M_2$. $M_3$ would then compute
$\alpha^{s_1 s_2 s_3}$, append the result to the incoming message to create
$\{\alpha^{s_1}, \alpha^{s_1 s_2}, \alpha^{s_1 s_2 s_3}\}$, and forward to
$M_4$.

The upflow stage is completed when $M_n$ calculates
$\alpha^{s_1 s_2 \cdots s_n}$, which is the intended group key, $K_n$. Once
$M_n$ has obtained $K_n$, the downflow stage is initiated. Each member $M_i$
receives $i$ values, one to compute $K_n$ and $i-1$ to send to $M_{i-1}$.
For example, if $n = 4$, $M_3$ would receive
$\{\alpha^{s_4}, \alpha^{s_1 s_4}, \alpha^{s_1 s_2 s_4}\}$ from $M_4$. First,
$M_3$ would use the last value to compute
$K_n = \alpha^{s_1 s_2 s_3 s_4}$. Then, the remaining values would be raised
to $s_3$ and $\{\alpha^{s_3 s_4}, \alpha^{s_1 s_3 s_4}\}$ would be sent to
$M_2$. $M_2$ would repeat the procedure, and would send
$\{\alpha^{s_2 s_3 s_4}\}$ to $M_1$. The downflow stage is then completed
when $M_1$ computes $K_n = \alpha^{s_1 s_2 \cdots s_n}$. GDH.1 is depicted
in Figure 4.


[Figure: GDH.1 message flow, Stage 1 (Upflow) and Stage 2 (Downflow)]

Figure 4. GDH.1 [From 16]


The active adversary, Eve, wishes to attack the key agreement, forcing the
$n$ parties to agree on a key in a small subgroup of $\mathbb{Z}_p^*$. As in
the two-party case, if possible Eve must first break the prime number $p$
down into the form $p = Rq+1$ with $q$ a large prime and $R$ a small integer.
Once completed, Eve must then intercept and alter two messages to complete
the attack. The first message she must intercept is the first message sent,
that is,

$$M_1 \to M_2 : \alpha^{s_1}.$$

With $\alpha^{s_1}$ captured, Eve computes
$(\alpha^{s_1})^q = \alpha^{s_1 q}$ and proceeds to send the computed number
as the message on to $M_2$. $M_2$ computes $\alpha^{s_1 s_2 q}$ and sends
$\{\alpha^{s_1 q}, \alpha^{s_1 s_2 q}\}$ to $M_3$. This continues until the
end of the upflow stage, when $M_n$ computes
$\alpha^{s_1 s_2 \cdots s_n q} = K_n$. Eve has forced $K_n$ to be one of $R$
values, based on the theory of the attack described earlier in the chapter.

Next, Eve must intercept the first message sent during the downflow stage.
If $n = 4$, then

$$M_4 \to M_3 : \{\alpha^{s_4}, \alpha^{s_1 s_4 q}, \alpha^{s_1 s_2 s_4 q}\}.$$

NOTE: Because of the alteration Eve completed in the upflow stage, only the
first part of the message must be altered.

Eve simply computes $\alpha^{s_4 q}$, replaces the first number with the
computation, and forwards the message to $M_3$. The participants all arrive
at $K_n = \alpha^{s_1 s_2 \cdots s_n q}$, and the key exchange has been
successfully attacked. However, in this case Eve had to capture and alter two
very specific messages for the attack to be successful. In the next protocol,
Eve has more flexibility.
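The two alterations can be traced with a compact simulation of GDH.1. The sketch assumes the message layout described above; the function name, the parameter `q` (Eve tampers only when it is given), and the example secrets are illustrative choices, not part of [16].

```python
def gdh1(p, alpha, secrets, q=None):
    """Run GDH.1 for members M_1..M_n and return the key each one computes.
    If q is given, Eve raises two intercepted values to the power q."""
    n = len(secrets)
    keys = [None] * n

    # Stage 1 (upflow): M_i appends (last value)^s_i and forwards the list.
    flow = []
    for i in range(n - 1):
        last = flow[-1] if flow else alpha
        flow.append(pow(last, secrets[i], p))
        if q is not None and i == 0:
            flow[0] = pow(flow[0], q, p)      # Eve alters M_1 -> M_2
    keys[n - 1] = pow(flow[-1], secrets[n - 1], p)

    # Stage 2 (downflow): M_n raises alpha and all but the last upflow
    # value to s_n and sends the list to M_{n-1}.
    down = [pow(v, secrets[n - 1], p) for v in [alpha] + flow[:-1]]
    if q is not None:
        down[0] = pow(down[0], q, p)          # Eve alters alpha^(s_n) only
    for i in range(n - 2, -1, -1):
        keys[i] = pow(down[-1], secrets[i], p)            # last value gives K_n
        down = [pow(v, secrets[i], p) for v in down[:-1]]  # rest is passed on
    return keys
```

Running `gdh1(19991, 3, [11, 23, 35, 47], q=1999)` leaves all four members with the same key $K$, and $K^{10} \equiv 1 \pmod{19991}$, so the key is confined to at most ten values.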


Next, we turn our attention to Group Diffie-Hellman version 3 (GDH.3). GDH.3
reduces the amount of computation each party (except for $M_n$) must
complete, which may be very beneficial if the group size is large. The
protocol consists of four stages. The first stage is similar to the upflow
stage of GDH.1 in which every member contributes to the key. However, after
processing the upflow message, $M_{n-1}$ broadcasts
$\alpha^{\prod\{s_k \mid k \in [1, n-1]\}}$ to the entire group as the second
stage of the process. In stage three, each $M_i$, except $M_n$, factors out
their contribution ($s_i$) from the broadcast value and forwards the result
to $M_n$. After $M_n$ collects all the values from the group, in the last
stage $M_n$ raises each value to $s_n$ and returns the values to the group.
Now each $M_i$ has $\alpha^{\prod\{s_k \mid k \in [1, n],\ k \ne i\}}$ and
simply raises this value to $s_i$ to compute $K_n$.

For example, if $n = 5$, the upflow stage completes when $M_4$ computes
$\alpha^{s_1 s_2 s_3 s_4}$. Then, in stage 2, this value is broadcast to the
entire group. In stage 3, each member other than $M_5$ factors out their
contribution and forwards the result to $M_5$ (for example, $M_1$ would send
$\alpha^{s_2 s_3 s_4}$). In stage 4, $M_5$ raises each received value to
$s_5$ and returns the value to the sender (for example, $M_1$ would receive
$\alpha^{s_2 s_3 s_4 s_5}$). Lastly, each member raises the received value to
their secret number and arrives at $K_5$. Figure 5 depicts GDH.3.


[Figure: GDH.3 message flow, beginning with Stage 1 (Upflow)]

Figure 5. GDH.3 [From 16]


It is much easier for Eve to attack GDH.3 than GDH.1. She needs only to
intercept and alter one message, and she can choose any of the first $n-2$
messages sent in the group. If Eve raises any one of these messages to $q$,
$M_{n-1}$ will inevitably broadcast $\alpha^{s_1 s_2 \cdots s_{n-1} q}$ to
the group. At this point, each member factors out their contribution and
forwards the result to $M_n$, leaving $q$ in the exponent of each message
sent. $M_n$ simply raises each message to $s_n$ and returns each message.
Therefore, $q$ is undisturbed, each member arrives at the same key
$K_n = \alpha^{s_1 s_2 \cdots s_n q}$, and Eve has successfully forced the
group into a small number of possible values for the key. However, as
mentioned above, if the parties agree to use either authentication or prime
order subgroups during the key exchange, attacks of this sort are prevented.
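The single alteration needed against GDH.3 can be sketched in the same style. The simulation assumes that every secret is invertible modulo $p-1$, so that a member can factor out its own exponent; the function name, the `tamper_round` parameter, and the example secrets are hypothetical, and `pow(s, -1, m)` requires Python 3.8 or later.

```python
from math import gcd

def gdh3(p, alpha, secrets, q=None, tamper_round=0):
    """Run GDH.3 and return the key each member computes. If q is given,
    Eve raises upflow message number `tamper_round` to the power q."""
    n = len(secrets)
    order = p - 1
    if any(gcd(s, order) != 1 for s in secrets):
        raise ValueError("each secret must be invertible modulo p - 1")

    # Stage 1 (upflow): M_1 .. M_{n-2} each pass on a single value.
    value = alpha
    for i in range(n - 2):
        value = pow(value, secrets[i], p)
        if q is not None and i == tamper_round:
            value = pow(value, q, p)          # Eve's only alteration

    # Stage 2: M_{n-1} adds its contribution and broadcasts.
    broadcast = pow(value, secrets[n - 2], p)

    # Stage 3: each M_i (i < n) factors out s_i and sends the result to M_n.
    responses = [pow(broadcast, pow(s, -1, order), p) for s in secrets[:-1]]

    # Stage 4: M_n raises each response to s_n and returns it;
    # each M_i then raises the returned value to s_i.
    returned = [pow(v, secrets[-1], p) for v in responses]
    keys = [pow(v, s, p) for v, s in zip(returned, secrets[:-1])]
    keys.append(pow(broadcast, secrets[-1], p))   # M_n's own key
    return keys
```

With `p = 19991`, secrets `[3, 7, 11, 13, 17]`, and `q = 1999`, tampering with any one of the three upflow messages leaves all five members holding the same key, and that key satisfies $K^{10} \equiv 1 \pmod{19991}$.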
V. RESULTS AND FUTURE WORK


This thesis investigated and analyzed a particular man-in-the-middle
attack on the Diffie-Hellman key exchange protocol. We created an algorithm to
carry out the attack and demonstrated how it is constrained by the primality test
used by the attacker. In particular, if the Miller-Rabin primality test is used, the
algorithm’s complexity is polynomial in $\log P$, with $P$ being the input prime number. We
showed that prime numbers of the form $P = Rq + 1$ with $R$ bounded are common
among small primes but become increasingly rare as larger numbers are
considered. In fact, for small primes such as 128-bit primes, a reasonably sized $R$
will give an attacker a good chance of the prime being of the desired form.
However, when large primes such as 1024 and 2048 bits are considered, a very 
large value of R is required to give an attacker a reasonable chance of 
conducting the attack. We demonstrated how two techniques, authentication and 
prime order subgroups, can prevent the attack. In fact, it appears industry has 
begun to adopt the prime order subgroup technique to defend against the attack. 
Finally, we demonstrated how the attack can be expanded to include a class of 
multi-party Diffie-Hellman variants. 
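

As an illustration of how an attacker might test whether a given prime has the form $P = Rq + 1$ with $R$ bounded, the following sketch tries each candidate $R$ up to a bound and applies a Miller-Rabin test to the cofactor. It is a hedged reconstruction for illustration and should not be read as the exact algorithm analyzed in this thesis: the function names is_probable_prime and find_small_cofactor, the bound parameter, and the number of Miller-Rabin rounds are all assumptions.

```python
import random


def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    # Trial division by a few small primes handles small and easy inputs.
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # a is a witness that n is composite
    return True


def find_small_cofactor(P, bound):
    """Return (R, q) with P = R*q + 1, 2 <= R <= bound, q probably prime.

    Returns None if no such R exists within the bound.
    """
    for R in range(2, bound + 1):
        if (P - 1) % R == 0:
            q = (P - 1) // R
            if is_probable_prime(q):
                return R, q
    return None


print(find_small_cofactor(607, 10))   # (6, 101), since 607 = 6*101 + 1
print(find_small_cofactor(257, 10))   # None: 256 has no prime cofactor here
```

Each candidate $R$ costs at most one primality test, so the running time of this sketch grows with the bound on $R$ and with the cost of the primality test on numbers of roughly the size of $P$.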


Possible future efforts include coding and implementing the man-in-the- 
middle attack on active communications to test the theory laid out in this thesis. 
It is possible that analyzing the given prime number, capturing the required
messages, altering those messages, and forwarding the messages to the
intended recipients will be too time-consuming. Such a delay would likely alert the
parties to a possible compromise. In addition, it may be possible to alter the attack
to compromise communications that are authenticated and render several Diffie- 
Hellman variants such as the STS protocol vulnerable. Other future work may 
include an attempt to defeat the prime order subgroup technique. 
APPENDIX: ANOTHER MAN-IN-THE-MIDDLE ATTACK 


This appendix details a possible man-in-the-middle attack on the Diffie-Hellman
key exchange protocol, if no prior authentication occurs [17].


1) Alice sends her public key to Bob, but Eve intercepts it, and Bob
never receives the key.

2) Eve spoofs Alice’s identity and sends over her public key to Bob.
Bob now thinks that he has Alice’s public key.

3) Bob sends his public key to Alice, but Eve intercepts it, and Alice
never receives the key.

4) Eve spoofs Bob’s identity and sends over her public key to Alice.
Alice now thinks that she has Bob’s public key.

5) Alice combines her private key and Eve’s public key and creates
symmetric key S1.

6) Eve combines her private key and Alice’s public key and creates
symmetric key S1.

7) Bob combines his private key and Eve’s public key and creates
symmetric key S2.

8) Eve combines her private key and Bob’s public key and creates
symmetric key S2.

9) At this point, Alice and Eve share a symmetric key (S1) and Bob
and Eve share a different symmetric key (S2). Alice and Bob think
they are sharing a key between themselves and do not realize that
Eve is involved.

10) Alice writes a message to Bob, uses her symmetric key (S1) to
encrypt the message, and sends it.

11) Eve intercepts the message and decrypts it with the symmetric key
S1, reads or modifies the message and re-encrypts it with
symmetric key S2, and sends it to Bob.

12) Bob takes symmetric key S2 and uses it to decrypt and read the
message.


Figure 6 illustrates the attack [17].


[Figure: Eve sits between Alice and Bob, sharing key S1 with Alice and key S2 with Bob.]


Figure 6. Another Man-in-the-Middle Attack 
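

The twelve steps listed in this appendix can be sketched in Python as follows. This is a toy illustration under stated assumptions, not an implementation from [17]: the modulus 2^127 - 1, the base 3, the helper names public_key, shared_key, xor_cipher, and run_attack, and the hash-based XOR cipher standing in for a real symmetric cipher are all hypothetical choices, and none of them is suitable for real use.

```python
import hashlib

P = 2**127 - 1   # toy prime modulus (assumption; far too small in practice)
G = 3            # toy public base (assumption)


def public_key(private):
    return pow(G, private, P)


def shared_key(private, other_public):
    return pow(other_public, private, P)


def xor_cipher(key, data):
    """Stand-in symmetric cipher: XOR with a hash of the key (insecure)."""
    stream = hashlib.sha256(str(key).encode()).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(data))


def run_attack(a, b, e, message):
    """Walk through steps 1-12 with private keys a (Alice), b (Bob), e (Eve)."""
    alice_pub, bob_pub, eve_pub = public_key(a), public_key(b), public_key(e)

    # Steps 1-4: Eve intercepts both public keys and forwards her own,
    # so Alice and Bob each receive eve_pub instead of the real key.
    s1_alice = shared_key(a, eve_pub)      # step 5
    s1_eve = shared_key(e, alice_pub)      # step 6
    s2_bob = shared_key(b, eve_pub)        # step 7
    s2_eve = shared_key(e, bob_pub)        # step 8

    ciphertext = xor_cipher(s1_alice, message)        # step 10
    read_by_eve = xor_cipher(s1_eve, ciphertext)      # step 11: decrypt
    forwarded = xor_cipher(s2_eve, read_by_eve)       # step 11: re-encrypt
    read_by_bob = xor_cipher(s2_bob, forwarded)       # step 12
    return s1_alice == s1_eve, s2_bob == s2_eve, read_by_eve, read_by_bob


result = run_attack(123456789, 987654321, 555555555, b"meet at noon")
print(result)  # (True, True, b'meet at noon', b'meet at noon')
```

Eve reads the plaintext in step 11 without knowing either Alice's or Bob's private key, which is why prior authentication of the exchanged public keys is needed to prevent this attack.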